
False lock-order-inversion reports for locks taken in a single thread #488

Open
ramosian-glider opened this issue Sep 1, 2015 · 25 comments

Comments

@ramosian-glider
Member

Originally reported on Google Code with ID 81

Consider two locks, L1 and L2, and a thread T1 that does the following:
 lock(L1)
 lock(L2)
 unlock(L1)
 lock(L1)
 unlock(L1)
 unlock(L2)

Meanwhile, other threads may (or may not) acquire and release L1 and L2 (but never
both) at the same time.
Based on the lock acquisition pattern in T1, ThreadSanitizer will report a lock-order
inversion (potential deadlock), although there is no possibility of a deadlock as long
as other threads do not take both L1 and L2.
The error can even be reported in a single-threaded process, which is not correct at all.
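
For illustration, a minimal C++ sketch of this pattern (using std::mutex; the function
name is made up) that triggers the report even though only one thread ever holds both
locks:

    #include <mutex>

    std::mutex L1, L2;

    // Runs entirely on one thread (T1). TSan records both the L1->L2 and the
    // L2->L1 edge and reports a potential lock-order inversion anyway.
    void t1_body() {
      L1.lock();
      L2.lock();     // edge L1 -> L2 (L2 acquired while L1 is held)
      L1.unlock();
      L1.lock();     // edge L2 -> L1 (L1 acquired while L2 is held)
      L1.unlock();
      L2.unlock();
    }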

Reported by glider@chromium.org on 2014-10-09 09:25:52

@ramosian-glider
Member Author

Example: https://code.google.com/p/chromium/issues/detail?id=412097

Reported by glider@google.com on 2014-10-09 09:27:28

@ramosian-glider
Member Author

Another one: https://code.google.com/p/v8/source/browse/trunk/test/cctest/test-lockers.cc?r=24476#483

Run() acquires the first lock at line 485 and the second lock at line 499, creating
the L1->L2 edge in the lock acquisition graph.
Then the first lock is unlocked at line 510 and locked again at line 517 (when the
unlocker dies), creating the L2->L1 edge.

Reported by glider@google.com on 2014-10-09 09:35:08

@ramosian-glider
Member Author

We can probably filter such reports out if we prove that each edge has been created
by the same thread. This may require keeping, for each pair (L_i, L_j), the list of
threads that acquired L_i while holding L_j.
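
A rough sketch of that bookkeeping (not tsan's actual data structures; all names are
made up):

    #include <map>
    #include <set>
    #include <utility>
    #include <vector>

    using MutexId  = unsigned;
    using ThreadId = unsigned;
    using Edge     = std::pair<MutexId, MutexId>;   // (mutex held, mutex acquired)

    // For each edge "L_i acquired while L_j is held", remember the threads that
    // created it.
    std::map<Edge, std::set<ThreadId>> edge_threads;

    void on_acquire(ThreadId t, MutexId held, MutexId acquired) {
      edge_threads[{held, acquired}].insert(t);
    }

    // Filter the report only if every edge in the cycle was created by one and
    // the same thread (and by no other thread).
    bool single_thread_cycle(const std::vector<Edge>& cycle) {
      const std::set<ThreadId>* common = nullptr;
      for (const Edge& e : cycle) {
        auto it = edge_threads.find(e);
        if (it == edge_threads.end() || it->second.size() != 1) return false;
        if (common && *common != it->second) return false;
        common = &it->second;
      }
      return true;
    }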

Reported by glider@google.com on 2014-10-09 09:37:06

@ramosian-glider
Member Author

Reported by dvyukov@google.com on 2014-10-09 09:52:07

  • Status changed: Accepted

@ramosian-glider
Member Author

>> We can probably filter such reports out if we prove that each edge has been created
by the same thread.

We shouldn't, IMHO...
What if the locking happens in a worker thread and there is a single worker just
because the test is small (but the large app will have more workers)?


Are you sure the pattern in #0 is good coding practice?
I would rather annotate that code (btw, do we have annotations?) than try to fix the
detector. 

Reported by konstantin.s.serebryany on 2014-10-09 18:37:59

@ramosian-glider
Member Author

> What if the locking happens in a worker thread and there is a single worker just
because the test is small (but the large app will have more workers)?

The same can happen with a data race: tsan won't report racy accesses that happen
to land in the same thread, and asan won't report a racy use-after-free if the use
happens before the free.
Our general strategy to date has been to sacrifice some potential true positives in
order to avoid false positives, and to require tests to expose the bugs -- for any
concurrency-related testing (races, deadlocks, etc.) that means the test must be
multi-threaded.
I don't understand why we need to deviate from this line in this case.
Some of these cases involve pre-built libraries, so annotations are not possible. And
suppressions have a high chance of suppressing true deadlocks.
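
For reference, a tsan suppression for this kind of report matches frames in the
reported stacks; a sketch, with a made-up frame name:

    # suppressions.txt, passed to the run via TSAN_OPTIONS=suppressions=suppressions.txt
    # Hides every lock-order-inversion report whose stack mentions this frame --
    # including any genuine inversion that goes through it, which is the risk above.
    deadlock:WorkerPool::Schedule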

Reported by dvyukov@google.com on 2014-10-09 20:04:26

@ramosian-glider
Member Author

I previously worked on a Java deadlock detector (JCarder) and provided this as an option.
The default was to report all cycles, but there was an option to only include multi-threaded
cycles in the output. In using jcarder on some large Java projects, I found it helpful
to have both options available.

Reported by todd@cloudera.com on 2015-01-08 04:05:21

@ramosian-glider
Member Author

Hi Todd,

Thanks for sharing your experience!

Can you provide an example of where it is useful to report "single-threaded deadlocks"?
My concern is that developers frequently apply the tool to large projects with lots
of external dependencies; in such a context the tool ideally works without any tuning
(potentially sacrificing some percentage of true positives for the absence of false
positives), or uses some form of per-function/per-file annotations if absolutely
necessary. I do not really see how a global setting can be useful in such contexts.

Reported by dvyukov@google.com on 2015-01-19 08:35:21

@ramosian-glider
Member Author

I agree that in general, it's best to optimize to avoid false positives.

The reason that "single-threaded deadlock" reports are useful is just what the earlier
commenter mentioned. We run the tool on our unit test suite, and often unit tests
don't generate enough multi-threaded concurrency to cause thread pools to exercise
multiple threads in places like RPC handler pools.

Another case I've found is that, even though today the code might be safe, it's still
the case that different methods are taking a pair of locks in inverted order. If the
code changes a bit in the future (or users call library methods differently than the
unit tests) then you are likely to hit a deadlock. It's generally a best practice to
have a strict specified order of locks, and deviation from it is a code smell which
many teams would want to eliminate early on.
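
To illustrate (a small C++ sketch; all names are made up): with a documented rule of
"L1 before L2", the second function below is the kind of code smell meant here --
harmless while it never runs concurrently with the first, but a real deadlock the
moment it does:

    #include <mutex>

    std::mutex L1, L2;   // documented rule: always take L1 before L2

    void follows_rule() {
      std::lock_guard<std::mutex> a(L1);
      std::lock_guard<std::mutex> b(L2);
      // ... work under both locks ...
    }

    void violates_rule() {               // takes the same pair in the opposite order
      std::lock_guard<std::mutex> b(L2);
      std::lock_guard<std::mutex> a(L1);
      // ... work under both locks ...
    }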

So, I think it's best to allow a global setting, and also allow suppressions. Then,
in many projects it's feasible to enable the global setting and either fix or suppress
the few false positives that come out. If a project uses lots and lots of dependencies
that exhibit this sort of problem, they can disable the reporting of single-threaded
deadlocks.

Reported by tlipcon on 2015-01-19 08:43:18

@ramosian-glider
Member Author

OK, makes sense, thanks!

Reported by dvyukov@google.com on 2015-01-19 08:50:21

@ramosian-glider
Member Author

Adding Project:ThreadSanitizer as part of GitHub migration.

Reported by glider@google.com on 2015-07-30 09:21:32

  • Labels added: Project:ThreadSanitizer

@dvyukov
Contributor

dvyukov commented Sep 17, 2015

Just got another offline complaint about this.
Tsan reports a false lock-order inversion within a single thread in V8. The second mutex is taken only because there is a multi-threaded component internally protected with a mutex, but this component is used only on a single thread.

@kcc
Contributor

kcc commented Sep 17, 2015

On Thu, Sep 17, 2015 at 1:35 AM, Dmitry Vyukov wrote:

> Just got another offline complaint about this.
> Tsan reports a false lock-order inversion within a single thread in V8.

Why is it "false"?
Benign -- most likely. But not false.

Won't suppressions work for this case?

> The second mutex is taken only because there is a multi-threaded component
> internally protected with a mutex, but this component is used only on a
> single thread.



@dvyukov
Contributor

dvyukov commented Sep 17, 2015

A single thread either deadlocks or it does not. There is no "it did not deadlock now, but potentially can". Tsan does this extrapolation, and it is false.

@kcc
Contributor

kcc commented Sep 17, 2015

On Thu, Sep 17, 2015 at 9:22 AM, Dmitry Vyukov wrote:

> A single thread either deadlocks or it does not. There is no "it did not
> deadlock now, but potentially can". Tsan does this extrapolation, and it is
> false.

What if we have a thread pool that in a small test always schedules two
tasks on the same thread, but in a large system will not?

I don't mind a flag that will ignore such reports.
By default, these reports should be printed with some explanation why and a
mention of the flag.

(I may not have time to implement it any time soon :(



@dvyukov
Contributor

dvyukov commented Sep 17, 2015

Should we do the same for races? That is, if a thread accesses a variable twice, we could print a bug report saying: what if these accesses happen in different threads in the future? That does not make sense to me. False positives kill trust and experience.

@kcc
Contributor

kcc commented Sep 17, 2015

On Thu, Sep 17, 2015 at 9:46 AM, Dmitry Vyukov wrote:

> Should we do the same for races? That is, if a thread accesses a variable
> twice, we could print a bug report saying: what if these accesses happen in
> different threads in the future? That does not make sense to me. False
> positives kill trust and experience.

Well, ok, if you feel that strongly -- do it.



@dvyukov
Contributor

dvyukov commented Sep 17, 2015

That's the most popular complaint about the deadlock detector.
Nobody complains about race reports in that way. So we can do the same for deadlocks:
while it is all in a single thread -- silence; if the cycle involves multiple
threads -- report.

More precise semantics: each subsequent edge in the mutex cycle must be on a
different thread than the previous one.

@kcc
Contributor

kcc commented Sep 18, 2015

On Thu, Sep 17, 2015 at 9:55 AM, Dmitry Vyukov wrote:

> That's the most popular complaint about the deadlock detector. Nobody
> complains about race reports in that way. So we can do the same for
> deadlocks: while it is all in a single thread -- silence; if the cycle
> involves multiple threads -- report.
>
> More precise semantics: each subsequent edge in the mutex cycle must be on a
> different thread than the previous one.

Go for it!
If you want me to do this -- I can't promise anything before mid Oct.



@morehouse morehouse assigned dvyukov and unassigned google Jun 6, 2018
@morehouse
Contributor

@dvyukov: Any progress on this?

@dvyukov
Contributor

dvyukov commented Jun 7, 2018

No progress. Still relevant.

@jprotze

jprotze commented Feb 10, 2020

This issue is also triggered by file I/O in libgfortran.

@choller

choller commented Jul 16, 2020

We have been seeing this on mozilla-central more than once now. I really think the deadlock detector should check whether only a single thread is involved and not report in that case.

@SteffenKoehlerVector

@dvyukov Is there any progress on this one?

We have such a finding in our code base and were very confused as to why a different lock order inside the same thread would be a potential deadlock. So I would also classify this as a "false positive".

If you argue that you want this finding in order to detect such scenarios in test suites that don't actually run into the issue, I would say your test suite has gaps -> you can't complain about a test suite not detecting issues if you did not test the functionality (e.g., does my thread pool work if I have more than just a single worker?).

In the end, the sanitizer is supposed to analyze the specific binary at hand. If that binary suffers from an issue, it should ideally find and report it. If the binary does not suffer from the issue, nothing should be reported.

@dvyukov
Contributor

dvyukov commented Sep 20, 2022

no progress done/planned
